attention parameter

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Zhang, Qizhen, Gritsch, Nikolas, Gnaneshwar, Dwaraknath, Guo, Simon, Cairuz, David, Venkitesh, Bharat, Foerster, Jakob, Blunsom, Phil, Ruder, Sebastian, Ustun, Ahmet, Locatelli, Acyr

arXiv.org Artificial Intelligence

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using the experts' feed-forward networks (FFNs) to initialize the MoE's experts while merging other parameters. However, this approach limits the reuse of dense model parameters to the FFN layers only, thereby constraining the advantages when "upcycling" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. BAM makes full use of specialized dense models by not only using their FFNs to initialize the MoE layers but also leveraging the experts' attention parameters fully by initializing them into a soft variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from the dense models, including all attention parameters, for the best model performance; and 2) sharing key and value parameters across all experts for better inference efficiency. To further improve efficiency, we adapt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.
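
The upcycling step described above can be pictured with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes hypothetical dense blocks exposing .ffn and .attn submodules with k_proj/v_proj projections, and shows how each pretrained dense model would seed one FFN expert and one attention expert, including the shared key/value variant.

```python
# Minimal sketch (not the BAM codebase): upcycle pretrained dense transformer
# blocks into expert lists for an MoE block with both FFN experts and
# attention experts. The module layout (.ffn, .attn, k_proj, v_proj) is an
# assumption made for illustration.
import copy
import torch.nn as nn


def upcycle_block(dense_blocks, share_kv=False):
    """dense_blocks: pretrained blocks, each with .ffn and .attn submodules.
    share_kv: if True, all attention experts reuse one set of key/value
    projections (the cheaper variant mentioned in the abstract)."""
    ffn_experts = nn.ModuleList(copy.deepcopy(b.ffn) for b in dense_blocks)
    attn_experts = nn.ModuleList(copy.deepcopy(b.attn) for b in dense_blocks)

    if share_kv:
        shared_k, shared_v = attn_experts[0].k_proj, attn_experts[0].v_proj
        for expert in attn_experts:
            expert.k_proj = shared_k  # tie keys across experts
            expert.v_proj = shared_v  # tie values across experts

    return ffn_experts, attn_experts
```

A soft routing layer over the attention experts and the parallel attention arrangement mentioned in the abstract would sit on top of these expert lists; both are omitted here.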


GATE: How to Keep Out Intrusive Neighbors

Mustafa, Nimrah, Burkholz, Rebekka

arXiv.org Artificial Intelligence

Graph Attention Networks (GATs) are designed to provide flexible neighborhood aggregation that assigns weights to neighbors according to their importance. In practice, however, GATs are often unable to switch off task-irrelevant neighborhood aggregation, as we show experimentally and analytically. To address this challenge, we propose GATE, a GAT extension that offers three major advantages: i) It alleviates over-smoothing by addressing its root cause of unnecessary neighborhood aggregation. ii) Similarly to perceptrons, it benefits from higher depth as it can still utilize additional layers for (non-)linear feature transformations in case of (nearly) switched-off neighborhood aggregation. iii) By down-weighting connections to unrelated neighbors, it often outperforms GATs on real-world heterophilic datasets. To further validate our claims, we construct a synthetic test bed to analyze a model's ability to utilize the appropriate amount of neighborhood aggregation, which could be of independent interest.
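
The abstract's central point, that aggregation should be something the model can turn off, can be illustrated with a toy layer. The sketch below is not the GATE formulation from the paper; it merely adds a learnable gate to a plain GAT-style layer so the neighbor contribution can be scaled toward zero, leaving only a (non-)linear feature transformation of the node itself. All names are illustrative.

```python
# Illustrative sketch only (not the paper's GATE layer): a GAT-style layer
# with one extra learnable scalar gate that can switch off neighborhood
# aggregation. adj is assumed to contain self-loops so every row has at
# least one nonzero entry.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedGraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Parameter(torch.empty(2 * out_dim))  # attention vector
        self.gate = nn.Parameter(torch.zeros(1))         # aggregation gate
        nn.init.normal_(self.a, std=0.1)

    def forward(self, x, adj):
        # x: (N, in_dim) node features, adj: (N, N) binary adjacency matrix
        h = self.W(x)
        d = h.size(1)
        # GAT-style logits e_ij = LeakyReLU(a_src . h_i + a_dst . h_j)
        e = F.leaky_relu((h @ self.a[:d]).unsqueeze(1) + (h @ self.a[d:]).unsqueeze(0))
        e = e.masked_fill(adj == 0, float("-inf"))
        neigh = torch.softmax(e, dim=1) @ h              # aggregated neighbors
        g = torch.sigmoid(self.gate)                     # near 0 disables aggregation
        return F.elu((1 - g) * h + g * neigh)
```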


Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts

Somayajula, Sai Ashish, Liang, Youwei, Singh, Abhishek, Zhang, Li, Xie, Pengtao

arXiv.org Artificial Intelligence

Pretrained Language Models (PLMs) have advanced Natural Language Processing (NLP) tasks significantly, but finetuning PLMs on low-resource datasets poses significant challenges such as instability and overfitting. Previous methods tackle these issues by finetuning a strategically chosen subnetwork on a downstream task, while keeping the remaining weights fixed to the pretrained weights. However, they rely on a suboptimal criterion for subnetwork selection, leading to suboptimal solutions. To address these limitations, we propose a regularization method based on attention-guided weight mixup for finetuning PLMs. Our approach represents each network weight as a mixup of the task-specific weight and the pretrained weight, controlled by a learnable attention parameter, providing finer control over subnetwork selection. Furthermore, we employ a bi-level optimization (BLO) based framework on two separate splits of the training dataset, improving generalization and combating overfitting. We validate the efficacy of our proposed method through extensive experiments, demonstrating its superiority over previous methods, particularly in the context of finetuning PLMs on low-resource datasets.
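
A rough picture of the weight-mixup idea follows, assuming a hypothetical MixupLinear wrapper rather than the authors' code: each layer keeps the frozen pretrained weight, a trainable task-specific copy, and an elementwise attention parameter that interpolates between them. Only the forward pass is sketched; the bi-level optimization over the two training splits is omitted.

```python
# Sketch (illustrative, not the paper's implementation) of attention-guided
# weight mixup: w = alpha * w_task + (1 - alpha) * w_pretrained, with alpha
# a learnable, elementwise "attention" parameter in (0, 1).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixupLinear(nn.Module):
    def __init__(self, pretrained_linear):
        super().__init__()
        # frozen pretrained weight (buffer, not updated by the optimizer);
        # the pretrained layer is assumed to have a bias term
        self.register_buffer("w_pre", pretrained_linear.weight.detach().clone())
        self.w_task = nn.Parameter(self.w_pre.clone())            # task-specific copy
        self.alpha_logits = nn.Parameter(torch.zeros_like(self.w_pre))
        self.bias = nn.Parameter(pretrained_linear.bias.detach().clone())

    def forward(self, x):
        alpha = torch.sigmoid(self.alpha_logits)                  # elementwise in (0, 1)
        w = alpha * self.w_task + (1 - alpha) * self.w_pre        # weight mixup
        return F.linear(x, w, self.bias)
```

In the BLO framework described above, w_task would be updated on one split of the training data and alpha_logits on the other.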


Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies

Lawson, Daniel, Qureshi, Ahmed H.

arXiv.org Artificial Intelligence

Recent work has shown the promise of creating generalist, transformer-based models for language, vision, and sequential decision-making problems. To create such models, we generally require centralized training objectives, data, and compute. It is of interest whether we can more flexibly create generalist policies by merging together multiple task-specific, individually trained policies. In this work, we take a preliminary step in this direction through merging, or averaging, in parameter space subsets of Decision Transformers trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also demonstrate the importance of various methodological choices when merging policies, such as utilizing common pre-trained initializations, increasing model capacity, and utilizing Fisher information for weighting parameter importance. In general, we believe research in this direction could help democratize and distribute the process that forms multi-task robotics policies. Our implementation is available at https://github.com/daniellawson9999/merging-decision-transformers.
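
The merging operation itself is plain parameter averaging; a hedged sketch follows (illustrative only, not taken from the linked repository), with an optional Fisher-weighted variant matching the abstract's mention of using Fisher information to weight parameter importance. The helper name and argument layout are assumptions.

```python
# Illustrative sketch: average several state dicts with identical keys and
# shapes, optionally weighting each parameter by a per-model Fisher estimate.
import torch


def merge_state_dicts(state_dicts, fishers=None):
    merged = {}
    for key in state_dicts[0]:
        params = torch.stack([sd[key].float() for sd in state_dicts])
        if fishers is None:
            merged[key] = params.mean(dim=0)       # uniform average
        else:
            w = torch.stack([f[key].float() for f in fishers]) + 1e-8
            merged[key] = (w * params).sum(dim=0) / w.sum(dim=0)
    return merged
```

The merged dictionary would then be loaded into a model with the same architecture, e.g. policy.load_state_dict(merge_state_dicts([sd_walker, sd_hopper])), where the policy and state dict names are hypothetical.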